Use the Climate Data Catalog
After generating the catalog in the other notebook, we can use it here to find, load, and plot the data.
Imports#
import intake
from distributed import Client, LocalCluster
import hvplot.xarray
import matplotlib.pyplot as plt
import holoviews as hv
hv.extension("bokeh")
Spin up a Dask Cluster#
cluster = LocalCluster()
client = Client(cluster)
client
2022-07-21 09:18:38,603 - distributed.diskutils - INFO - Found stale lock file and directory '/Users/mgrover/git_repos/cloud-for-climate/notebooks/dask-worker-space/worker-7e2ejztf', purging
2022-07-21 09:18:38,603 - distributed.diskutils - INFO - Found stale lock file and directory '/Users/mgrover/git_repos/cloud-for-climate/notebooks/dask-worker-space/worker-enzfygne', purging
2022-07-21 09:18:38,604 - distributed.diskutils - INFO - Found stale lock file and directory '/Users/mgrover/git_repos/cloud-for-climate/notebooks/dask-worker-space/worker-gdb25teo', purging
2022-07-21 09:18:38,604 - distributed.diskutils - INFO - Found stale lock file and directory '/Users/mgrover/git_repos/cloud-for-climate/notebooks/dask-worker-space/worker-evnyv1jl', purging
Client
Client-0412cbb2-0900-11ed-9129-acde48001122
| Connection method: Cluster object | Cluster type: distributed.LocalCluster |
| Dashboard: http://127.0.0.1:8787/status |
Cluster Info
LocalCluster
0d31f132
| Dashboard: http://127.0.0.1:8787/status | Workers: 4 |
| Total threads: 12 | Total memory: 16.00 GiB |
| Status: running | Using processes: True |
Scheduler Info
Scheduler
Scheduler-61265159-e2d9-4dbd-8e32-ad04bf8b6cec
| Comm: tcp://127.0.0.1:55489 | Workers: 4 |
| Dashboard: http://127.0.0.1:8787/status | Total threads: 12 |
| Started: Just now | Total memory: 16.00 GiB |
Workers
Worker: 0
| Comm: tcp://127.0.0.1:55504 | Total threads: 3 |
| Dashboard: http://127.0.0.1:55505/status | Memory: 4.00 GiB |
| Nanny: tcp://127.0.0.1:55492 | |
| Local directory: /Users/mgrover/git_repos/cloud-for-climate/notebooks/dask-worker-space/worker-uyorb34d | |
Worker: 1
| Comm: tcp://127.0.0.1:55511 | Total threads: 3 |
| Dashboard: http://127.0.0.1:55514/status | Memory: 4.00 GiB |
| Nanny: tcp://127.0.0.1:55494 | |
| Local directory: /Users/mgrover/git_repos/cloud-for-climate/notebooks/dask-worker-space/worker-5r82wbzs | |
Worker: 2
| Comm: tcp://127.0.0.1:55510 | Total threads: 3 |
| Dashboard: http://127.0.0.1:55512/status | Memory: 4.00 GiB |
| Nanny: tcp://127.0.0.1:55495 | |
| Local directory: /Users/mgrover/git_repos/cloud-for-climate/notebooks/dask-worker-space/worker-nrg7jfnz | |
Worker: 3
| Comm: tcp://127.0.0.1:55507 | Total threads: 3 |
| Dashboard: http://127.0.0.1:55508/status | Memory: 4.00 GiB |
| Nanny: tcp://127.0.0.1:55493 | |
| Local directory: /Users/mgrover/git_repos/cloud-for-climate/notebooks/dask-worker-space/worker-mnh6sj2t | |
Access the data#
We can read in the intake-esm catalog with intake.open_esm_datastore, then search it for the near-surface air temperature variable, TREFHT.
data_catalog = intake.open_esm_datastore("catalogs/arise-catalog.json")
data_catalog.search(variable='TREFHT').keys()
['month_1.SSP245-TSMLT-GAUSS-DEFAULT.1',
'month_1.SSP245-TSMLT-GAUSS-DEFAULT.2',
'month_1.SSP245-TSMLT-GAUSS-DEFAULT.3',
'month_1.SSP245-TSMLT-GAUSS-DEFAULT.4',
'month_1.SSP245-TSMLT-GAUSS-DEFAULT.5',
'month_1.SSP245-TSMLT-GAUSS-DEFAULT.6',
'month_1.SSP245-TSMLT-GAUSS-DEFAULT.7',
'month_1.SSP245-TSMLT-GAUSS-DEFAULT.8',
'month_1.SSP245-TSMLT-GAUSS-DEFAULT.9',
'month_1.SSP245-TSMLT-GAUSS-DEFAULT.10']
Load the Data Using Dask#
dsets = data_catalog.search(variable='TREFHT').to_dataset_dict()
--> The keys in the returned dictionary of datasets are constructed as follows:
'frequency.experiment.member_id'
100.00% [10/10 00:01<00:00]
sorted_keys = sorted(dsets.keys(), key=lambda x: (len(x), x))
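The sort key above orders the keys by string length first and alphabetically second, so the key ending in .10 comes after the single-digit members instead of landing between .1 and .2. A minimal sketch of the difference, using three of the keys listed earlier. The helper member_number is a hypothetical name introduced here for illustration, and it assumes the member_id is the text after the last dot:

```python
keys = [
    "month_1.SSP245-TSMLT-GAUSS-DEFAULT.10",
    "month_1.SSP245-TSMLT-GAUSS-DEFAULT.2",
    "month_1.SSP245-TSMLT-GAUSS-DEFAULT.1",
]

# A plain sort compares character by character, so ".10" sorts before ".2".
plain = sorted(keys)

# Sorting by (length, text) puts the shorter single-digit keys first.
by_length = sorted(keys, key=lambda x: (len(x), x))


def member_number(key):
    """Hypothetical helper: read the member number after the last dot."""
    return int(key.rsplit(".", 1)[1])


# Sorting on the parsed integer gives the same order without relying on length.
by_member = sorted(keys, key=member_number)

print([member_number(k) for k in plain])
print([member_number(k) for k in by_length])
print([member_number(k) for k in by_member])
```

The length-based key works here because every key shares the same frequency and experiment prefix. If the catalog search returned several experiments with names of different lengths, sorting on the parsed member number (possibly together with the experiment name) would likely be the more robust choice.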
Investigate our Dataset#
With the datasets loaded, we can take a first look at the temperature field in each ensemble member.
Plot Using Matplotlib#
We can start with a single time step:
for key in sorted_keys:
    # Select the first time step of this ensemble member
    ds = dsets[key].isel(time=0)
    ds.TREFHT.plot()
    plt.title(f"{ds.time.values} \n {ds.case}")
    plt.show()
    plt.close()
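Next, we plot the time series at a single grid point, here the one nearest Chicago, IL:
for key in sorted_keys:
    # Select the grid point closest to the requested latitude and longitude
    ds = dsets[key].sel(lat=41.8781,
                        lon=-87.6298,
                        method='nearest')
    ds.TREFHT.plot()
    plt.title(f"Chicago, IL \n {ds.case}")
    plt.show()
    plt.close()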
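One thing worth checking before trusting the point selection: with method='nearest', xarray returns the closest grid point even when the requested coordinate lies outside the range of the grid. Many climate model grids store longitude as 0 to 360 degrees east rather than -180 to 180. Whether this dataset does is an assumption here, so inspect ds.lon first. If it does, a negative longitude should be wrapped before selecting. The sketch below uses a hypothetical helper, to_0_360, and an illustrative 1.25-degree longitude grid built from plain Python numbers rather than read from the data:

```python
def to_0_360(lon):
    """Hypothetical helper: wrap a longitude in degrees east into [0, 360)."""
    return lon % 360


def nearest(grid, value):
    """Return the grid value closest to the requested value."""
    return min(grid, key=lambda g: abs(g - value))


# Illustrative grid (an assumption, not read from the dataset): 0, 1.25, ..., 358.75
lon_grid = [i * 1.25 for i in range(288)]

chicago_lon = -87.6298

# Without wrapping, the nearest point is the western edge of the grid.
unwrapped = nearest(lon_grid, chicago_lon)

# After wrapping, the nearest point is close to 272 degrees east.
wrapped = nearest(lon_grid, to_0_360(chicago_lon))

print(unwrapped, wrapped)
```

If the longitudes do turn out to run from 0 to 360, the selection would become dsets[key].sel(lat=41.8781, lon=to_0_360(-87.6298), method='nearest'). If they already run from -180 to 180, no change is needed.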
Plot Using hvPlot#
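hvPlot builds interactive plots from the same xarray objects through the .hvplot accessor.
Again, we can start with a single time step, this time for the first ensemble member only: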
dsets['month_1.SSP245-TSMLT-GAUSS-DEFAULT.1'].TREFHT.isel(time=0).hvplot(cmap='magma')
And the time series at the grid point nearest Chicago, IL:
dsets['month_1.SSP245-TSMLT-GAUSS-DEFAULT.1'].TREFHT.sel(
    lat=41.8781,
    lon=-87.6298,
    method='nearest',
).hvplot.line(title='Temperature near Chicago, IL')
WARNING:param.CurvePlot02293: Converting cftime.datetime from a non-standard calendar (noleap) to a standard calendar for plotting. This may lead to subtle errors in formatting dates, for accurate tick formatting switch to the matplotlib backend.